{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 前言-lightgbm是什么？\n",
    "\n",
    "LightGBM 是一个梯度 boosting 框架, 使用基于学习算法的决策树. 它是分布式的, 高效的, 装逼的, 它具有以下优势:\n",
    "\n",
    " - 速度和内存使用的优化\n",
    "   - 减少分割增益的计算量\n",
    "   - 通过直方图的相减来进行进一步的加速\n",
    "   - 减少内存的使用 减少并行学习的通信代价\n",
    " - 稀疏优化\n",
    " - 准确率的优化\n",
    "   - Leaf-wise (Best-first) 的决策树生长策略\n",
    "   - 类别特征值的最优分割\n",
    " - 网络通信的优化\n",
    " - 并行学习的优化\n",
    "   - 特征并行\n",
    "   - 数据并行\n",
    "   - 投票并行\n",
    " - GPU 支持可处理大规模数据\n",
    "\n",
    "## 前言-这是什么？\n",
    "\n",
    "这是强化版本的lightgbm的Python用户指南，由FontTian个人在Lightgbm官方文档的基础上改写，旨在能够更快的让lightgbm的学习者学会在python中使用lightgbm，类似文章可以参考[在Python中使用XGBoost](https://blog.csdn.net/FontThrone/article/details/85046810)\n",
    "\n",
    "相关参考请看最后\n",
    "\n",
    "## 引用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:07:42.754763Z",
     "start_time": "2018-12-18T04:07:41.922338Z"
    }
   },
   "outputs": [],
   "source": [
    "import lightgbm as lgb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据格式\n",
    "\n",
    "LightGBM Python 版本的模型能够从以下格式中加载数据:\n",
    "\n",
    " - libsvm/tsv/csv/txt format file\n",
    " - NumPy 2D array(s), pandas DataFrame, SciPy sparse matrix\n",
    " - LightGBM binary file\n",
    " \n",
    "各种格式我们这里不在太多叙述，详细请参考[原文档](https://lightgbm.readthedocs.io/en/latest/Python-Intro.html)\n",
    "\n",
    "以下示例代码是本次所使用的，具体的数据请前往[github](https://github.com/FontTian/hyperopt-doc-zh/tree/master/tutorials/data)下载。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:07:43.091752Z",
     "start_time": "2018-12-18T04:07:43.084272Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def GetNewDataByPandas():\n",
    "    wine = pd.read_csv(\"/home/fonttian/Data/UCI/wine/wine.csv\")\n",
    "    wine['alcohol**2'] = pow(wine[\"alcohol\"], 2)\n",
    "    wine['volatileAcidity*alcohol'] = wine[\"alcohol\"] * wine['volatile acidity']\n",
    "    y = np.array(wine.quality)\n",
    "    X = np.array(wine.drop(\"quality\", axis=1))\n",
    "\n",
    "    columns = np.array(wine.columns)\n",
    "\n",
    "    return X, y, columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "之后我们对数据进行分割和转换，同时这里也给出一个使用lightgbm进行数据保存的例子"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:07:43.911288Z",
     "start_time": "2018-12-18T04:07:43.833128Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<lightgbm.basic.Dataset at 0x7fa28c5ce908>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "# Read wine quality data from file\n",
    "X, y, wineNames = GetNewDataByPandas()\n",
    "\n",
    "# split data to [[0.8,0.2],01]\n",
    "x_train_all, x_predict, y_train_all, y_predict = train_test_split(X, y, test_size=0.10, random_state=100)\n",
    "\n",
    "x_train, x_test, y_train, y_test = train_test_split(x_train_all, y_train_all, test_size=0.2, random_state=100)\n",
    "\n",
    "train_data = lgb.Dataset(data=x_train,label=y_train)\n",
    "test_data = lgb.Dataset(data=x_test,label=y_test)\n",
    "\n",
    "train_data.save_binary(\"/home/fonttian/Data/UCI/wine/wine_lightgbm_train.bin\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "同时我们还可以在lightgbm加载数据的时候同时指明类别特征，这对模型精度提升有一定的好处。\n",
    "```\n",
    "train_data = lgb.Dataset(data, label=label, feature_name=['c1', 'c2', 'c3'], categorical_feature=['c3'])\n",
    "```\n",
    "\n",
    "同样的权重也可以通过参数`weight`添加，特征名称则可以通过参数`feature_name`指明"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-17T06:26:48.509501Z",
     "start_time": "2018-12-17T06:26:48.492209Z"
    }
   },
   "source": [
    "## 参数设置\n",
    "\n",
    "模型参数设置如下，具体参数可以参考[官方文档列表](https://lightgbm.readthedocs.io/en/latest/Parameters.html),如果有需要，也可以参考ApacheCN开源社区提供的中文版本，因为目前社区页面正在大改，所以这里只给出[github的地址，这个是不会变的](https://github.com/apachecn/lightgbm-doc-zh)\n",
    "\n",
    " \n",
    "要注意的一点是如果直接使用lgb训练模型，而不是lightgbm提供的sklearn接口，那么我们就需要参数列表中的`objective`这一项来控制模型所处理的问题，默认为回归`regression`，除此之外还有加入l1或者l2惩罚项的回归-`regression_l1`和`regression_l2`。二分类问题请使用`binary`，多分类问题请`multiclass`。其他更多内容请直接参考前面给出的链接。\n",
    " \n",
    " - Booster parameters："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:07:45.509373Z",
     "start_time": "2018-12-18T04:07:45.506122Z"
    }
   },
   "outputs": [],
   "source": [
    "param = {'num_leaves':31, 'num_trees':100, 'objective':'regression'}\n",
    "param['metric'] = 'rmse'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当然你也可以同时设置两个目标损失函数，不过我们这里就暂时不需要了。\n",
    "```\n",
    "param['metric'] = ['auc', 'binary_logloss']\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练与模型持久化\n",
    "\n",
    "我们可以通过`train`方法来进行训练，然后通过`save_model`方法来存储模型。加载模型则需要使`lgb.Booster()`方法。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:07:47.003664Z",
     "start_time": "2018-12-18T04:07:46.826338Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1]\tvalid_0's rmse: 0.776822\n",
      "[2]\tvalid_0's rmse: 0.745311\n",
      "[3]\tvalid_0's rmse: 0.719537\n",
      "[4]\tvalid_0's rmse: 0.699099\n",
      "[5]\tvalid_0's rmse: 0.682042\n",
      "[6]\tvalid_0's rmse: 0.667974\n",
      "[7]\tvalid_0's rmse: 0.655903\n",
      "[8]\tvalid_0's rmse: 0.644988\n",
      "[9]\tvalid_0's rmse: 0.636119\n",
      "[10]\tvalid_0's rmse: 0.627744\n",
      "[11]\tvalid_0's rmse: 0.622323\n",
      "[12]\tvalid_0's rmse: 0.616088\n",
      "[13]\tvalid_0's rmse: 0.609313\n",
      "[14]\tvalid_0's rmse: 0.604047\n",
      "[15]\tvalid_0's rmse: 0.601131\n",
      "[16]\tvalid_0's rmse: 0.599835\n",
      "[17]\tvalid_0's rmse: 0.597717\n",
      "[18]\tvalid_0's rmse: 0.597067\n",
      "[19]\tvalid_0's rmse: 0.596798\n",
      "[20]\tvalid_0's rmse: 0.594959\n",
      "[21]\tvalid_0's rmse: 0.594611\n",
      "[22]\tvalid_0's rmse: 0.594723\n",
      "[23]\tvalid_0's rmse: 0.595082\n",
      "[24]\tvalid_0's rmse: 0.593598\n",
      "[25]\tvalid_0's rmse: 0.593295\n",
      "[26]\tvalid_0's rmse: 0.592676\n",
      "[27]\tvalid_0's rmse: 0.591653\n",
      "[28]\tvalid_0's rmse: 0.589948\n",
      "[29]\tvalid_0's rmse: 0.589441\n",
      "[30]\tvalid_0's rmse: 0.587821\n",
      "[31]\tvalid_0's rmse: 0.586129\n",
      "[32]\tvalid_0's rmse: 0.585581\n",
      "[33]\tvalid_0's rmse: 0.585773\n",
      "[34]\tvalid_0's rmse: 0.585499\n",
      "[35]\tvalid_0's rmse: 0.584101\n",
      "[36]\tvalid_0's rmse: 0.582139\n",
      "[37]\tvalid_0's rmse: 0.580319\n",
      "[38]\tvalid_0's rmse: 0.578988\n",
      "[39]\tvalid_0's rmse: 0.577562\n",
      "[40]\tvalid_0's rmse: 0.577479\n",
      "[41]\tvalid_0's rmse: 0.576327\n",
      "[42]\tvalid_0's rmse: 0.575712\n",
      "[43]\tvalid_0's rmse: 0.574505\n",
      "[44]\tvalid_0's rmse: 0.572647\n",
      "[45]\tvalid_0's rmse: 0.5728\n",
      "[46]\tvalid_0's rmse: 0.572897\n",
      "[47]\tvalid_0's rmse: 0.572488\n",
      "[48]\tvalid_0's rmse: 0.570868\n",
      "[49]\tvalid_0's rmse: 0.570563\n",
      "[50]\tvalid_0's rmse: 0.571549\n",
      "[51]\tvalid_0's rmse: 0.571709\n",
      "[52]\tvalid_0's rmse: 0.570889\n",
      "[53]\tvalid_0's rmse: 0.571102\n",
      "[54]\tvalid_0's rmse: 0.570533\n",
      "[55]\tvalid_0's rmse: 0.571138\n",
      "[56]\tvalid_0's rmse: 0.570745\n",
      "[57]\tvalid_0's rmse: 0.569937\n",
      "[58]\tvalid_0's rmse: 0.570427\n",
      "[59]\tvalid_0's rmse: 0.57052\n",
      "[60]\tvalid_0's rmse: 0.570113\n",
      "[61]\tvalid_0's rmse: 0.570237\n",
      "[62]\tvalid_0's rmse: 0.569579\n",
      "[63]\tvalid_0's rmse: 0.569251\n",
      "[64]\tvalid_0's rmse: 0.569235\n",
      "[65]\tvalid_0's rmse: 0.569525\n",
      "[66]\tvalid_0's rmse: 0.568762\n",
      "[67]\tvalid_0's rmse: 0.568264\n",
      "[68]\tvalid_0's rmse: 0.568412\n",
      "[69]\tvalid_0's rmse: 0.569076\n",
      "[70]\tvalid_0's rmse: 0.570043\n",
      "[71]\tvalid_0's rmse: 0.569722\n",
      "[72]\tvalid_0's rmse: 0.569831\n",
      "[73]\tvalid_0's rmse: 0.569786\n",
      "[74]\tvalid_0's rmse: 0.569378\n",
      "[75]\tvalid_0's rmse: 0.56967\n",
      "[76]\tvalid_0's rmse: 0.569391\n",
      "[77]\tvalid_0's rmse: 0.569041\n",
      "[78]\tvalid_0's rmse: 0.568359\n",
      "[79]\tvalid_0's rmse: 0.568905\n",
      "[80]\tvalid_0's rmse: 0.569168\n",
      "[81]\tvalid_0's rmse: 0.568847\n",
      "[82]\tvalid_0's rmse: 0.568482\n",
      "[83]\tvalid_0's rmse: 0.568473\n",
      "[84]\tvalid_0's rmse: 0.568237\n",
      "[85]\tvalid_0's rmse: 0.567286\n",
      "[86]\tvalid_0's rmse: 0.567953\n",
      "[87]\tvalid_0's rmse: 0.567875\n",
      "[88]\tvalid_0's rmse: 0.567637\n",
      "[89]\tvalid_0's rmse: 0.567944\n",
      "[90]\tvalid_0's rmse: 0.566821\n",
      "[91]\tvalid_0's rmse: 0.566865\n",
      "[92]\tvalid_0's rmse: 0.567163\n",
      "[93]\tvalid_0's rmse: 0.567749\n",
      "[94]\tvalid_0's rmse: 0.568281\n",
      "[95]\tvalid_0's rmse: 0.567905\n",
      "[96]\tvalid_0's rmse: 0.568127\n",
      "[97]\tvalid_0's rmse: 0.568309\n",
      "[98]\tvalid_0's rmse: 0.567629\n",
      "[99]\tvalid_0's rmse: 0.567858\n",
      "[100]\tvalid_0's rmse: 0.568459\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/fonttian/anaconda3/lib/python3.6/site-packages/lightgbm/engine.py:116: UserWarning: Found `num_trees` in params. Will use it instead of argument\n",
      "  warnings.warn(\"Found `{}` in params. Will use it instead of argument\".format(alias))\n"
     ]
    }
   ],
   "source": [
    "# Training a model requires a parameter list and data set:\n",
    "num_round = 10\n",
    "bst = lgb.train(param, train_data, num_round, valid_sets=[test_data])\n",
    "# After training, the model can be saved:\n",
    "bst.save_model('model.txt')\n",
    "# A saved model can be loaded:\n",
    "bst = lgb.Booster(model_file='model.txt')  #init model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-17T06:48:02.113962Z",
     "start_time": "2018-12-17T06:48:02.109544Z"
    }
   },
   "source": [
    "## 交叉验证"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:07:48.526410Z",
     "start_time": "2018-12-18T04:07:48.086195Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/fonttian/anaconda3/lib/python3.6/site-packages/lightgbm/engine.py:426: UserWarning: Found `num_trees` in params. Will use it instead of argument\n",
      "  warnings.warn(\"Found `{}` in params. Will use it instead of argument\".format(alias))\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'rmse-mean': [0.7752979247352452,\n",
       "  0.7504569754070065,\n",
       "  0.7291951383700093,\n",
       "  0.711414019324484,\n",
       "  0.6957242006911073,\n",
       "  0.682382826831543,\n",
       "  0.6720057162879787,\n",
       "  0.6636083897270609,\n",
       "  0.6563373159043315,\n",
       "  0.6502373426190309,\n",
       "  0.6451880385380665,\n",
       "  0.6412098621385699,\n",
       "  0.6373447110519322,\n",
       "  0.6337860771302629,\n",
       "  0.6321382536859316,\n",
       "  0.6295741295360276,\n",
       "  0.6284134812486762,\n",
       "  0.6271476794272949,\n",
       "  0.6249402556866194,\n",
       "  0.6227295939608833,\n",
       "  0.6224829225245734,\n",
       "  0.6215067008103283,\n",
       "  0.6209888902332126,\n",
       "  0.6200438264727558,\n",
       "  0.6193509588847586,\n",
       "  0.6193106531460751,\n",
       "  0.6187483235351566,\n",
       "  0.6185828069013477,\n",
       "  0.6175736888205522,\n",
       "  0.616698128051639,\n",
       "  0.6162297019817496,\n",
       "  0.6158322957365135,\n",
       "  0.6163201232392425,\n",
       "  0.6164189976131775,\n",
       "  0.6163109870127503,\n",
       "  0.6162803497687738,\n",
       "  0.6166193347213638,\n",
       "  0.616688261812905,\n",
       "  0.6169295165960578,\n",
       "  0.6168748900584461,\n",
       "  0.6166922031797195,\n",
       "  0.6168348531681817,\n",
       "  0.616220611143578,\n",
       "  0.6167309935705056,\n",
       "  0.6170087455190248,\n",
       "  0.6167049151979855,\n",
       "  0.6167362758995959,\n",
       "  0.6164371763370667,\n",
       "  0.6158342336362306,\n",
       "  0.6162342284921991,\n",
       "  0.6164050053381753,\n",
       "  0.6168768198263368,\n",
       "  0.6173011821111447,\n",
       "  0.6170733987835665,\n",
       "  0.6171773838458237,\n",
       "  0.6174168380721522,\n",
       "  0.6179050390042734,\n",
       "  0.6183106385275262,\n",
       "  0.6187308005843023,\n",
       "  0.6192588776785244,\n",
       "  0.6193631839301603,\n",
       "  0.6190675119966225,\n",
       "  0.6190499747790911,\n",
       "  0.6192636491897701,\n",
       "  0.6194732795951713,\n",
       "  0.6199795293520237,\n",
       "  0.6203123477254338,\n",
       "  0.619834445065732,\n",
       "  0.6202203337723841,\n",
       "  0.6206297900601851,\n",
       "  0.6205499824760786,\n",
       "  0.620661834148043,\n",
       "  0.6206926768482366,\n",
       "  0.6210059172018199,\n",
       "  0.6211326235864757,\n",
       "  0.6210662454576644,\n",
       "  0.6212339770296589,\n",
       "  0.6213655632403076,\n",
       "  0.6216149788726819,\n",
       "  0.6217868127180923,\n",
       "  0.6218182584674657,\n",
       "  0.6222212581627513,\n",
       "  0.6220386254338169,\n",
       "  0.6222260279284186,\n",
       "  0.6225157621278351,\n",
       "  0.6223799562645815,\n",
       "  0.6221679550425528,\n",
       "  0.6224065354573738,\n",
       "  0.6224959332300715,\n",
       "  0.6226524893217592,\n",
       "  0.622781243059824,\n",
       "  0.6228405059127796,\n",
       "  0.6229427433271868,\n",
       "  0.6230589567701139,\n",
       "  0.623185906492709,\n",
       "  0.6235742455058648,\n",
       "  0.623298732674607,\n",
       "  0.6235781963060918,\n",
       "  0.6234720880367247,\n",
       "  0.623599205645095],\n",
       " 'rmse-stdv': [0.012997665079702762,\n",
       "  0.010725846259504221,\n",
       "  0.00917602290462625,\n",
       "  0.009474069938943891,\n",
       "  0.00893079934316806,\n",
       "  0.008858714373995491,\n",
       "  0.008886129913338791,\n",
       "  0.008732150272380052,\n",
       "  0.009340018297460895,\n",
       "  0.009984258656280354,\n",
       "  0.009813785896225146,\n",
       "  0.010398158720048943,\n",
       "  0.010763735050913138,\n",
       "  0.011718819806154539,\n",
       "  0.011489049477663108,\n",
       "  0.012140014000793519,\n",
       "  0.012981183616055884,\n",
       "  0.01383315205616935,\n",
       "  0.015128231920498112,\n",
       "  0.015553857723701662,\n",
       "  0.01608053368146606,\n",
       "  0.017241140223277714,\n",
       "  0.017748596722364427,\n",
       "  0.018046571468908798,\n",
       "  0.0186116640824644,\n",
       "  0.019349048720057695,\n",
       "  0.021089697184363373,\n",
       "  0.021713288133987086,\n",
       "  0.022572893484405814,\n",
       "  0.022728162518712763,\n",
       "  0.023256428678083994,\n",
       "  0.02396647921855839,\n",
       "  0.024782282756982993,\n",
       "  0.025008894722644483,\n",
       "  0.025587585101080893,\n",
       "  0.02565884605071845,\n",
       "  0.025616098126429985,\n",
       "  0.026499177547935075,\n",
       "  0.02715706554012009,\n",
       "  0.026746237008381885,\n",
       "  0.0264892551095745,\n",
       "  0.026319723495441365,\n",
       "  0.026414376798483,\n",
       "  0.02639098966402332,\n",
       "  0.026693585959410657,\n",
       "  0.0272786597453678,\n",
       "  0.027372268895761907,\n",
       "  0.027595112670965642,\n",
       "  0.02772936297135302,\n",
       "  0.027730104176255856,\n",
       "  0.02760552100243517,\n",
       "  0.027435262884109534,\n",
       "  0.027805367598824442,\n",
       "  0.02787199148000504,\n",
       "  0.027718049662293263,\n",
       "  0.02710355241865112,\n",
       "  0.027550983954062767,\n",
       "  0.028034578723639403,\n",
       "  0.028308065324059217,\n",
       "  0.02852726693801683,\n",
       "  0.02899739761337874,\n",
       "  0.02930675609549474,\n",
       "  0.029384983743157463,\n",
       "  0.029515260630459065,\n",
       "  0.029586643485055044,\n",
       "  0.029929945510165687,\n",
       "  0.030548046606710167,\n",
       "  0.030147751707932588,\n",
       "  0.03039595846519692,\n",
       "  0.030507411051122667,\n",
       "  0.030466025227525798,\n",
       "  0.030921723361892,\n",
       "  0.031246132589905176,\n",
       "  0.03147592576121973,\n",
       "  0.03130888340168957,\n",
       "  0.03153889954757624,\n",
       "  0.03169714895120428,\n",
       "  0.03163912858999445,\n",
       "  0.03162980656956592,\n",
       "  0.03147748671876028,\n",
       "  0.03140629529402534,\n",
       "  0.0316424486190176,\n",
       "  0.0318049427006125,\n",
       "  0.03173095002734147,\n",
       "  0.0317269787061155,\n",
       "  0.031994245625869965,\n",
       "  0.03252983730455276,\n",
       "  0.03248305213946924,\n",
       "  0.03245367100556561,\n",
       "  0.03216900785907104,\n",
       "  0.03217195519618935,\n",
       "  0.03209784242315717,\n",
       "  0.03232556845784091,\n",
       "  0.032338956319005366,\n",
       "  0.03198684985392431,\n",
       "  0.03193506171564571,\n",
       "  0.03186771760590264,\n",
       "  0.03207466173953701,\n",
       "  0.031637346872895655,\n",
       "  0.03154536582175296]}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Training with 5-fold CV:\n",
    "num_round = 10\n",
    "lgb.cv(param, train_data, num_round, nfold=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 早停\n",
    "\n",
    "当我们拥有评价数据集的时候也就可以使用早停来获取更好的模型效果。评价数据集需要通过`valid_sets`来设置，早停阈值则通过`early_stopping_rounds`，这一参数在`cv`方法与`train`方法中都是通用的。\n",
    "\n",
    "这个方法在最小化的评价函数如（L2，log loss，等）和最大化的评价函数如（NDCG，AUC，等）上都是有效的。不过要注意的是，如果你使用不止一个评价函数，他们中的每一个都将被用于早停。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:07:49.329431Z",
     "start_time": "2018-12-18T04:07:49.201342Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1]\tvalid_0's rmse: 0.776822\n",
      "Training until validation scores don't improve for 10 rounds.\n",
      "[2]\tvalid_0's rmse: 0.745311\n",
      "[3]\tvalid_0's rmse: 0.719537\n",
      "[4]\tvalid_0's rmse: 0.699099\n",
      "[5]\tvalid_0's rmse: 0.682042\n",
      "[6]\tvalid_0's rmse: 0.667974\n",
      "[7]\tvalid_0's rmse: 0.655903\n",
      "[8]\tvalid_0's rmse: 0.644988\n",
      "[9]\tvalid_0's rmse: 0.636119\n",
      "[10]\tvalid_0's rmse: 0.627744\n",
      "[11]\tvalid_0's rmse: 0.622323\n",
      "[12]\tvalid_0's rmse: 0.616088\n",
      "[13]\tvalid_0's rmse: 0.609313\n",
      "[14]\tvalid_0's rmse: 0.604047\n",
      "[15]\tvalid_0's rmse: 0.601131\n",
      "[16]\tvalid_0's rmse: 0.599835\n",
      "[17]\tvalid_0's rmse: 0.597717\n",
      "[18]\tvalid_0's rmse: 0.597067\n",
      "[19]\tvalid_0's rmse: 0.596798\n",
      "[20]\tvalid_0's rmse: 0.594959\n",
      "[21]\tvalid_0's rmse: 0.594611\n",
      "[22]\tvalid_0's rmse: 0.594723\n",
      "[23]\tvalid_0's rmse: 0.595082\n",
      "[24]\tvalid_0's rmse: 0.593598\n",
      "[25]\tvalid_0's rmse: 0.593295\n",
      "[26]\tvalid_0's rmse: 0.592676\n",
      "[27]\tvalid_0's rmse: 0.591653\n",
      "[28]\tvalid_0's rmse: 0.589948\n",
      "[29]\tvalid_0's rmse: 0.589441\n",
      "[30]\tvalid_0's rmse: 0.587821\n",
      "[31]\tvalid_0's rmse: 0.586129\n",
      "[32]\tvalid_0's rmse: 0.585581\n",
      "[33]\tvalid_0's rmse: 0.585773\n",
      "[34]\tvalid_0's rmse: 0.585499\n",
      "[35]\tvalid_0's rmse: 0.584101\n",
      "[36]\tvalid_0's rmse: 0.582139\n",
      "[37]\tvalid_0's rmse: 0.580319\n",
      "[38]\tvalid_0's rmse: 0.578988\n",
      "[39]\tvalid_0's rmse: 0.577562\n",
      "[40]\tvalid_0's rmse: 0.577479\n",
      "[41]\tvalid_0's rmse: 0.576327\n",
      "[42]\tvalid_0's rmse: 0.575712\n",
      "[43]\tvalid_0's rmse: 0.574505\n",
      "[44]\tvalid_0's rmse: 0.572647\n",
      "[45]\tvalid_0's rmse: 0.5728\n",
      "[46]\tvalid_0's rmse: 0.572897\n",
      "[47]\tvalid_0's rmse: 0.572488\n",
      "[48]\tvalid_0's rmse: 0.570868\n",
      "[49]\tvalid_0's rmse: 0.570563\n",
      "[50]\tvalid_0's rmse: 0.571549\n",
      "[51]\tvalid_0's rmse: 0.571709\n",
      "[52]\tvalid_0's rmse: 0.570889\n",
      "[53]\tvalid_0's rmse: 0.571102\n",
      "[54]\tvalid_0's rmse: 0.570533\n",
      "[55]\tvalid_0's rmse: 0.571138\n",
      "[56]\tvalid_0's rmse: 0.570745\n",
      "[57]\tvalid_0's rmse: 0.569937\n",
      "[58]\tvalid_0's rmse: 0.570427\n",
      "[59]\tvalid_0's rmse: 0.57052\n",
      "[60]\tvalid_0's rmse: 0.570113\n",
      "[61]\tvalid_0's rmse: 0.570237\n",
      "[62]\tvalid_0's rmse: 0.569579\n",
      "[63]\tvalid_0's rmse: 0.569251\n",
      "[64]\tvalid_0's rmse: 0.569235\n",
      "[65]\tvalid_0's rmse: 0.569525\n",
      "[66]\tvalid_0's rmse: 0.568762\n",
      "[67]\tvalid_0's rmse: 0.568264\n",
      "[68]\tvalid_0's rmse: 0.568412\n",
      "[69]\tvalid_0's rmse: 0.569076\n",
      "[70]\tvalid_0's rmse: 0.570043\n",
      "[71]\tvalid_0's rmse: 0.569722\n",
      "[72]\tvalid_0's rmse: 0.569831\n",
      "[73]\tvalid_0's rmse: 0.569786\n",
      "[74]\tvalid_0's rmse: 0.569378\n",
      "[75]\tvalid_0's rmse: 0.56967\n",
      "[76]\tvalid_0's rmse: 0.569391\n",
      "[77]\tvalid_0's rmse: 0.569041\n",
      "Early stopping, best iteration is:\n",
      "[67]\tvalid_0's rmse: 0.568264\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/fonttian/anaconda3/lib/python3.6/site-packages/lightgbm/engine.py:116: UserWarning: Found `num_trees` in params. Will use it instead of argument\n",
      "  warnings.warn(\"Found `{}` in params. Will use it instead of argument\".format(alias))\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<lightgbm.basic.Booster at 0x7fa2530a52e8>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bst = lgb.train(param, train_data, num_round, valid_sets=[test_data], early_stopping_rounds=10)\n",
    "bst.save_model('model.txt', num_iteration=bst.best_iteration)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-17T07:06:49.367328Z",
     "start_time": "2018-12-17T07:06:49.055502Z"
    }
   },
   "source": [
    "在cv中使用方法相同，同时也要保证最起码需要一个评价函数，多个评价函数时，也会同时使用其中的每一个用于早停。而交叉验证中的结果则会使用评级函数历史上最后一个，也就是只要添加了`early_stopping_rounds`参数，那么交叉验证也就会默认会使用早停后的结果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:11:50.991700Z",
     "start_time": "2018-12-18T04:11:50.662850Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/fonttian/anaconda3/lib/python3.6/site-packages/lightgbm/engine.py:426: UserWarning: Found `num_trees` in params. Will use it instead of argument\n",
      "  warnings.warn(\"Found `{}` in params. Will use it instead of argument\".format(alias))\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "min rmse-mean of the lightgbm's Cross-validation: 0.6158322957365135\n"
     ]
    }
   ],
   "source": [
    "# Use earlystoppping and training with 5-fold CV:\n",
    "num_round = 10\n",
    "res = lgb.cv(param, train_data, num_round, nfold=5,early_stopping_rounds=10)\n",
    "\n",
    "print(\"min rmse-mean of the lightgbm's Cross-validation:\", min(res['rmse-mean']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 预测\n",
    "\n",
    "如果需要在预测中使用早停，请使用`num_iteration=bst.best_iteration`。效果如下"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-18T04:07:52.940452Z",
     "start_time": "2018-12-18T04:07:52.925566Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RMSE of predict : 0.5956830546692128\n"
     ]
    }
   ],
   "source": [
    "ypred = bst.predict(x_predict, num_iteration=bst.best_iteration)\n",
    "\n",
    "from sklearn.metrics import mean_squared_error\n",
    "RMSE = np.sqrt(mean_squared_error(y_predict, ypred))\n",
    "\n",
    "print(\"RMSE of predict :\",RMSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参考与学习推荐\n",
    " \n",
    " - [lightgbm官方文档](https://github.com/Microsoft/LightGBM)\n",
    " - [ApacheCN维护的的中文文档github的地址，这个是不会变的](https://github.com/apachecn/lightgbm-doc-zh)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.6"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
